Including Dialects and Language Varieties in Author Profiling
Abstract
This paper presents a computational approach to author profiling that takes gender and language variety into account. We apply an ensemble system that combines the outputs of multiple linear SVM classifiers trained on character and word n-grams. We evaluate the system on the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% average accuracy on gender identification for tweets written in four languages and 97% accuracy on language variety identification for Portuguese.
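The abstract names the ingredients (character and word n-grams, several linear SVMs, an ensemble) but not the n-gram ranges, the SVM trainer, or the fusion rule. The following is a minimal sketch under assumed choices for all three: character 2- to 4-grams, word unigrams and bigrams, a Pegasos-style hinge-loss trainer, and majority voting. Every function name, parameter value, and the toy Portuguese sentences are hypothetical illustrations, not the authors' setup or the PAN data.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max (assumed range)."""
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats


def word_ngrams(text, n=1):
    """Count word n-grams of order n over lowercased whitespace tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def word_bigrams(text):
    return word_ngrams(text, 2)


class LinearSVM:
    """Binary linear SVM fitted by Pegasos-style subgradient descent on the hinge loss."""

    def __init__(self, extract, lam=0.01, epochs=20):
        self.extract, self.lam, self.epochs = extract, lam, epochs
        self.w = {}

    def _dot(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        if len(self.classes) != 2:
            raise ValueError("this sketch handles exactly two classes")
        t = 0
        for _ in range(self.epochs):
            for text, label in zip(texts, labels):
                t += 1
                eta = 1.0 / (self.lam * t)  # decaying step size
                y = 1.0 if label == self.classes[1] else -1.0
                x = self.extract(text)
                violated = y * self._dot(x) < 1.0  # example inside the margin?
                for f in self.w:  # L2 regularization shrinks every weight
                    self.w[f] *= 1.0 - eta * self.lam
                if violated:  # hinge-loss subgradient step
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * v
        return self

    def predict(self, texts):
        return [self.classes[1] if self._dot(self.extract(t)) >= 0 else self.classes[0]
                for t in texts]


def majority_vote(per_model_predictions):
    """Fuse one label list per model; the most common label per text wins."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_predictions)]


def train_ensemble(texts, labels):
    """One SVM per feature type (three members, so binary votes cannot tie)."""
    extractors = [char_ngrams, word_ngrams, word_bigrams]
    return [LinearSVM(e).fit(texts, labels) for e in extractors]


def predict_ensemble(members, texts):
    return majority_vote([m.predict(texts) for m in members])


# Made-up toy sentences standing in for Brazilian vs. European Portuguese tweets.
train_texts = [
    "você pegou o ônibus com o celular",
    "a gente vai de trem para o trabalho",
    "tu apanhaste o autocarro com o telemóvel",
    "nós vamos de comboio para o trabalho",
]
train_labels = ["BR", "BR", "PT", "PT"]

members = train_ensemble(train_texts, train_labels)
print(predict_ensemble(members, ["perdi o celular no ônibus", "perdi o telemóvel no autocarro"]))
```

Training one classifier per feature type keeps each member's feature space separate, so fusion happens over predicted labels rather than over one concatenated feature vector. The sketch is binary only; a task with more than two language varieties would need a multi-class scheme such as one-vs-rest, and in practice a library SVM implementation would likely replace the hand-written trainer.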
Similar resources
Linguistic Audit as a Professional Activity
The subject of this research is linguistic (or language) audit. The term is new and not yet widely used. Linguistic audit, in particular, is offered as a service by linguistic-consulting agencies. Modern linguistic consulting, according to the author, is a form of stimulating theoretical and practical development of linguistic ecology, a new branch of applied linguistics, ...
The Short Vowels /i/ and /u/ in Iranian Balochi Dialects
The aim of the present paper is to study the status of the short vowels /i/ and /u/ in five selected Iranian Balochi dialects. These dialects are spoken in the Sistan (SI), Saravan (SA), Khash (KH), Iranshahr (IR), and Chabahar (CH) regions of Sistan va Baluchestan province in the southeast of Iran. This study investigates whether these two vowels have the same qualities as the short /i/ an...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffer from problems like high dimensionality of features and fail to capture th...
A Historical and Comparative Study of the Ergative Verb Structure in Ardakani, Dashti, Dashtaki, Yazdi Jewish and Lari Dialects
Introduction: A dialect is a variety of a language used by a group of people whose lexicon, syntax, phonetics and phonology differ from those of other people. The many geographical, economic and social barriers among the speakers of a language cause many dialects to emerge. As such, each language has many dialects and accents and each dialect has many different ac...
Sentence-level dialects identification in the greater China region
Identifying different varieties of the same language is more challenging than identifying unrelated languages. In this paper, we propose an approach to discriminate language varieties or dialects of Mandarin Chinese for Mainland China, Hong Kong, Taiwan, Macao, Malaysia and Singapore, a.k.a. the Greater China Region (GCR). When applied to the dialects identification of the GCR, we f...
Journal title: CoRR
Volume: abs/1707.00621
Publication date: 2017